Finding the Number of Clusters using Visual Validation VAT Algorithm
Abstract
Clustering is the process of grouping a set of data in such a way that data in the same group (cluster) are more similar to each other than to data in other groups. K-Means is a widely used clustering algorithm. Its main disadvantage is that the value of k must be selected by the user. To overcome this drawback, visual methods such as the VAT algorithm are generally used for cluster analysis and also to estimate the k value prior to clustering. In many cases, however, the estimated result does not match the true (but unknown) value. The Spectral VAT (Spec-VAT) algorithm was then implemented; it is more efficient than the VAT algorithm for complex data sets. Spec-VAT based algorithms such as A Spec-VAT, P Spec-VAT and E Spec-VAT are also used to find the number of clusters efficiently. However, the range of k is given, either directly or indirectly, to these spectral VAT algorithms. In this paper we propose a direct visual validation method and a divergence matrix. In the proposed work, neither the value of k nor the range of k is specified by the user, directly or indirectly. Instead of using a k value, we propose a new method that compares objects and, from the result, chooses the object that is closer than the other objects. Experimental results show that the proposed VVAT (Visual Validation VAT) algorithm performs much better than the other algorithms. Keywords: VAT algorithm, visual validation, divergence matrix, VVAT algorithm
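The abstract does not spell out the VVAT procedure or the divergence matrix, so the following is only a minimal sketch of the standard VAT reordering step that these methods build on, not the proposed algorithm. The function names `vat_order` and `reorder` and the toy data are assumptions made for illustration. VAT starts from an object involved in the largest dissimilarity and then repeatedly appends the unordered object closest to any already-ordered object; displaying the reordered matrix as an image shows clusters as dark blocks along the diagonal.

```python
def vat_order(D):
    """Return a VAT ordering for a square, symmetric dissimilarity matrix D
    (given as a list of lists). Illustrative sketch, not the paper's VVAT."""
    n = len(D)
    # Start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j]: smallest dissimilarity between j and any already-ordered object.
    best = {j: D[start][j] for j in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set
        # (ties broken by the lower index).
        nxt = min(remaining, key=lambda j: (best[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            best[j] = min(best[j], D[nxt][j])
    return order


def reorder(D, order):
    """Apply the same permutation to the rows and columns of D."""
    return [[D[i][j] for j in order] for i in order]


# Toy data (hypothetical): two well-separated groups on a line, interleaved.
points = [0.0, 10.0, 0.5, 10.5, 1.0, 11.0]
D = [[abs(a - b) for b in points] for a in points]

order = vat_order(D)
D_star = reorder(D, order)
print(order)
for row in D_star:
    print(row)
```

In the reordered matrix the two groups occupy contiguous index ranges, so counting the dark diagonal blocks gives a visual estimate of k without the user supplying it.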
Similar resources
An Efficient Visual Analysis Method for Cluster Tendency Evaluation, Data Partitioning and Internal Cluster Validation
Visual methods have been extensively studied and performed in cluster data analysis. Given a pairwise dissimilarity matrix D of a set of n objects, visual methods such as Enhanced-Visual Assessment Tendency (E-VAT) algorithm generally represent D as an n × n image I(D) where the objects are reordered to expose the hidden cluster structure as dark blocks along the diagonal of the image. A major ...
A New Implementation of the co-VAT Algorithm for Visual Assessment of Clusters in Rectangular Relational Data
This paper presents a new implementation of the co-VAT algorithm. We assume we have an m × n matrix D, where the elements of D are pair-wise dissimilarities between m row objects O_r and n column objects O_c. The union of these disjoint sets is the set O of N = m + n objects. Clustering tendency assessment is the process by which a data set is analyzed to determine the number(s) of clusters present. In 2007,...
A Comparative study of Clustering in Unlabelled Datasets Using Extended Dark Block Extraction and Extended Cluster Count Extraction
One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data prior to clustering. In this paper, we implement a new method for determining the number of clusters called Extended Dark Block Extraction (EDBE), which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set. Its basic steps include 1) Generatin...
A New Approach in Strategy Formulation using Clustering Algorithm: An Instance in a Service Company
The increasingly severe and dynamic competitive environment has led to increasing complexity of strategic decision making in giant organizations. Strategy formulation is one of the basic processes in achieving long-range goals. In ordinary methods, considering all factors and their significance in accomplishing individual goals is almost impossible. Here, a new approach based on clustering method is pro...
Parallel Visual Assessment of Cluster Tendency on GPU
Determining the number of clusters in a data set is a critical issue in cluster analysis. The Visual Assessment of (cluster) Tendency (VAT) algorithm is an effective tool for investigating cluster tendency, which produces an intuitive image of matrix as the representation of complex data sets. However, VAT can be computationally expensive for large data sets due to its O(N^2) time complexity....